The AMI Meeting Corpus

نویسنده

  • I. McCowan
چکیده

To support multi-disciplinary research in the AMI (Augmented Multi-party Interaction) project, a 100 hour corpus of meetings is being collected. This corpus is being recorded in several instrumented rooms equipped with a variety of microphones, video cameras, electronic pens, presentation slide capture and white-board capture devices. As well as real meetings, the corpus contains a significant proportion of scenario-driven meetings, which have been designed to elicit a rich range of realistic behaviors. To facilitate research, the raw data are being annotated at a number of levels including speech transcriptions, dialogue acts and summaries. The corpus is being distributed using a web server designed to allow convenient browsing and download of multimedia content and associated annotations. This article first overviews AMI research themes, then discusses corpus design, as well as data collection, annotation and distribution.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Combining Multiple Information Layers for the Automatic Generation of Indicative Meeting Abstracts

We describe a new application for NLG technology: the generation of indicative, abstractive summaries of multi-party meetings. Based on the freely available AMI corpus of 100 hours of recorded meetings, we are developing a summarizer that uses the rich annotations in the AMI corpus.

متن کامل

Generating Usable Formats for Metadata and Annotations in a Large Meeting Corpus

The AMI Meeting Corpus is now publicly available, including manual annotation files generated in the NXT XML format, but lacking explicit metadata for the 171 meetings of the corpus. To increase the usability of this important resource, a representation format based on relational databases is proposed, which maximizes informativeness, simplicity and reusability of the metadata and annotations. ...

متن کامل

Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus

Creating the AMI Meeting Corpus was an ambitious endeavour, probably more ambitious than the people who first thought of it realize even now. It contains 100 hours of meetings captured using a whole host of synchronized recording devices, and is designed to support work in speech and video processing, language engineering, corpus linguistics, and organizational psychology. It has been transcrib...

متن کامل

A Multimodal Corpus for Studying Dominance in Small Group Conversations

We present a new multimodal corpus with dominance annotations on small group conversations. We used five-minute non-overlapping slices from a subset of meetings selected from the popular Augmented Multi-party Interaction (AMI) corpus. The total length of the annotated corpus corresponds to 10 hours of meeting data. Each meeting is observed and assessed by three annotators according to their lev...

متن کامل

The AMI Meeting Corpus: A Pre-announcement

The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings. It is being created in the context of a project that is developing meeting browsing technology and will eventually be released publicly. Some of the meetings it contains are naturally occurring, and some are elicited, particularly using a scenario in which the participants play different roles in a d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005